Pattern Discovery in Time-oriented Data
Abstract
We present EasyMiner, a data mining system developed for the interactive mining of interesting patterns in time-oriented databases. The system implements a wide spectrum of data mining functions, including generalisation, characterisation, classification, association and relevance analysis. By extending several data mining techniques, including attribute induction and association rule mining, to handle time-oriented data, the system provides a user-friendly, interactive data mining environment with good performance. The algorithms were tested on time-oriented medical data, and the experimental results show that they are efficient and effective for discovering patterns in databases.
INTRODUCTION
Knowledge Discovery in Databases (KDD) is the effort to understand, analyse, and eventually make use of the huge volume of available data. According to Fayyad et al. [1], KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. In their view, a KDD process usually comprises several steps, including selection, pre-processing, transformation, data mining, and interpretation/evaluation of the results, as shown in Figure 1. As the figure shows, data mining is only one step of the process: it applies discovery tools to find interesting patterns in the targeted data. Nevertheless, the research community most often uses the terms data mining and KDD interchangeably. Finding patterns in databases is the fundamental operation behind several common data mining tasks, including association rule mining [2] and sequential pattern mining [3]. Data mining is the process of applying machine learning and other techniques to classical databases in order to extract implicit, previously unknown and potentially useful patterns [4].
Time is an important aspect of all real-world phenomena.
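As a minimal illustration of what a snapshot database loses, the Python sketch below contrasts a hypothetical snapshot table, which overwrites a value on every update, with a history-preserving table that stores each value together with the date it became valid. The class names, the key `patient-17/blood_pressure` and the dates are invented for illustration; they are assumptions, not part of EasyMiner.

```python
from datetime import date


class SnapshotTable:
    """Conventional store (hypothetical): an update overwrites the old value."""

    def __init__(self):
        self.rows = {}

    def update(self, key, value, when):
        # 'when' is deliberately ignored: only the latest state survives.
        self.rows[key] = value

    def history(self, key):
        return [self.rows[key]] if key in self.rows else []


class HistoryTable:
    """Time-oriented store (hypothetical): every version is kept with its valid-from date."""

    def __init__(self):
        self.rows = {}

    def update(self, key, value, when):
        self.rows.setdefault(key, []).append((when, value))

    def history(self, key):
        # All recorded values, oldest first.
        return [v for _, v in sorted(self.rows.get(key, []), key=lambda tv: tv[0])]

    def as_of(self, key, when):
        """Value valid on date 'when', or None if nothing was recorded by then."""
        valid = [(t, v) for t, v in self.rows.get(key, []) if t <= when]
        return max(valid, key=lambda tv: tv[0])[1] if valid else None


updates = [
    (date(2005, 3, 1), "normal"),
    (date(2006, 7, 15), "borderline"),
    (date(2007, 11, 2), "high"),
]
snap, hist = SnapshotTable(), HistoryTable()
for when, value in updates:
    snap.update("patient-17/blood_pressure", value, when)
    hist.update("patient-17/blood_pressure", value, when)

print(snap.history("patient-17/blood_pressure"))  # ['high']
print(hist.history("patient-17/blood_pressure"))  # ['normal', 'borderline', 'high']
print(hist.as_of("patient-17/blood_pressure", date(2006, 12, 31)))  # borderline
```

Only the history-preserving table can answer a historical query such as "what was the value at the end of 2006?", which is the kind of information that pattern discovery over time-oriented data depends on.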
Conventional databases model an enterprise, which changes dynamically, as a snapshot at a particular point in time. When information in a conventional database is updated, the old, out-of-date data is discarded forever, and its changes over time are lost. In many situations, however, this snapshot-type database is inadequate, because it cannot answer queries about historical data. For many applications, such as accounting, banking, geographic information systems (GIS) and medical data, the changes made to their databases over time are a valuable source of information that can direct their future operation. Patterns discovered from conventional databases therefore have limited value, since they reflect only the current or latest snapshot and ignore the temporal nature of the data.
In the rest of the paper, the data mining techniques used by EasyMiner for discovering interesting patterns in time-oriented databases are described. Section 2 presents the relevance analysis method, with examples from the medical domain. Section 3 presents the association rule mining technique, and Section 4 describes EasyMiner's approach to pattern discovery by mining classification rules. Section 5 concludes with a summary of the paper and an outline of future work.
Figure 1. The knowledge discovery process
1 Relevance Analysis
As we know from real life, many facts are related to one another and strongly interdependent: for example, Age is relevant to Date of Birth, and Title (Mr, Mrs, Miss) is relevant to Sex and Marital Status. This kind of knowledge is qualitative, and it is quite useful to mine it from large databases that hold information about many objects (fields). For example, a bank could examine its data to identify the factors it should take into account when setting a customer's Credit Limit. Using EasyMiner, we found that Credit Limit is relevant to Account Status, Monthly Expenses, Marital Status, Monthly Income, Sex, etc.
A number of statistical and machine learning techniques for relevance analysis have been proposed. The most popular and widely accepted approach in the data mining community is to measure the uncertainty coefficient [4]. Let us suppose that the set generated from the collection of task-relevant data is a set P of p data records. Suppose also that the interesting attribute has m distinct values, thereby defining m
[Figure 1 labels: Data → Selection → Target data → Preprocessing → Preprocessed data → Transformation → Data Mining → Patterns → Interpretation/Evaluation → Knowledge]
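The excerpt breaks off before the uncertainty coefficient is defined. As a hedged sketch, the Python below assumes the definition commonly used for attribute relevance analysis in the data mining literature: with p_i the number of records of P in class i of the interesting attribute, I(p_1, ..., p_m) = -Σ (p_i/p) log2(p_i/p); E(A) is the expected information still needed after partitioning P on the values of attribute A; and U(A) = (I(p_1, ..., p_m) - E(A)) / I(p_1, ..., p_m). The function names and the four bank records, which echo the Credit Limit example, are invented for illustration and are not EasyMiner output.

```python
from collections import Counter, defaultdict
from math import log2


def info(counts):
    """Expected information I(p_1, ..., p_m) for a collection of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def uncertainty_coefficient(records, attribute, target):
    """U(A) = (I(P) - E(A)) / I(P), assuming the standard definition.

    E(A) weights the information of each partition of P (one per value of A)
    by the fraction of the p records that fall into it.
    """
    class_counts = Counter(r[target] for r in records)
    i_p = info(class_counts.values())
    if i_p == 0:
        return 0.0  # the target has a single value, so nothing can be explained
    partitions = defaultdict(Counter)
    for r in records:
        partitions[r[attribute]][r[target]] += 1
    p = len(records)
    e_a = sum(sum(part.values()) / p * info(part.values()) for part in partitions.values())
    return (i_p - e_a) / i_p


# Hypothetical toy data: Credit Limit is the interesting attribute.
records = [
    {"monthly_income": "high", "marital_status": "single", "sex": "F", "credit_limit": "high"},
    {"monthly_income": "high", "marital_status": "married", "sex": "M", "credit_limit": "high"},
    {"monthly_income": "low", "marital_status": "married", "sex": "F", "credit_limit": "low"},
    {"monthly_income": "low", "marital_status": "married", "sex": "M", "credit_limit": "low"},
]
attributes = ["sex", "marital_status", "monthly_income"]
ranked = sorted(
    attributes,
    key=lambda a: uncertainty_coefficient(records, a, "credit_limit"),
    reverse=True,
)
print(ranked)  # ['monthly_income', 'marital_status', 'sex']
```

Under this definition, an attribute with U(A) close to 1 almost determines the interesting attribute, while one near 0 carries little information about it, so such a ranking could be used to keep only the most relevant attributes before further mining.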
Similar resources
Mind the Time: Unleashing the Temporal Aspects in Pattern Discovery
Temporal Data Mining is a core concept of Knowledge Discovery in Databases handling time-oriented data. State-of-the-art methods are capable of preserving the temporal order of events as well as the information in between. The temporal nature of the events themselves, however, can likely be misinterpreted by current algorithms. We present a new definition of the temporal aspects of events and ex...
Mind the time: Unleashing temporal aspects in pattern discovery
Temporal Data Mining is a core concept of Knowledge Discovery in Databases handling time-oriented data. State-of-the-art methods are capable of preserving the temporal order of events as well as the temporal intervals in between. The temporal characteristics of the events themselves, however, can likely lead to numerous uninteresting patterns found by current approaches. We present a new defini...
On the Design of Knowledge Discovery Services Design Patterns and Their Application in a Use Case Implementation
As service-orientation becomes more and more popular in the computer science community, more research is done in applying service-oriented applications in specific fields, including knowledge discovery. In this paper we investigate how the service-oriented paradigm can benefit knowledge discovery, and how specific services and the knowledge discovery process as a whole should be designed. We pr...
Weighted-HR: An Improved Hierarchical Grid Resource Discovery
Grid computing environments include heterogeneous resources shared by a large number of computers to handle the data and process intensive applications. In these environments, the required resources must be accessible for Grid applications on demand, which makes the resource discovery as a critical service. In recent years, various techniques are proposed to index and discover the Grid resource...
A Novel Technique for Pattern Extraction in Mixed Data
Knowledge discovery in databases or data mining is an important issue in the development of data and knowledge base system. The Self Organizing Map (SOM) is a vector quantization method which places the prototype vectors on a regular low-dimensional grid in an ordered fashion. Clustering data and extracting patterns from the clusters are very important tasks in data mining. An attribute-oriented...
Publication date: 2008